Quantum many-body problems are among the most challenging problems in science and are central to demystifying exotic quantum phenomena such as high-temperature superconductivity. Neural networks (NNs) for representing quantum states, combined with the Variational Monte Carlo (VMC) algorithm, have been shown to be a promising method for solving such problems. However, the run-time of this approach scales quadratically with the number of simulated particles, constraining the practically usable NNs to, in machine learning terms, minuscule sizes (<10M parameters). Considering the many breakthroughs brought by extremely large NNs at the 1B+ parameter scale in other domains, lifting this constraint could significantly expand the set of quantum systems we can accurately simulate on classical computers, both in size and complexity. We propose an NN architecture called Vector-Quantized Neural Quantum States (VQ-NQS) that uses vector-quantization techniques to exploit redundancies in the local-energy calculations of the VMC algorithm, which are the source of the quadratic scaling. In our preliminary experiments, we demonstrate the ability of VQ-NQS to reproduce the ground state of the 2D Heisenberg model across various system sizes, while reporting a reduction of about ${\times}10$ in the number of FLOPs in the local-energy calculation.
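For context, the sketch below (a minimal PyTorch illustration, with an arbitrary real-valued log-amplitude network standing in for the NQS; sign conventions and basis rotations used in practice are omitted) shows the VMC local energy for the spin-1/2 Heisenberg model. Its O(N) connected configurations, each requiring its own network evaluation, are the redundancy-laden step whose quadratic cost VQ-NQS targets.

```python
import torch

def heisenberg_local_energy(log_psi, sigma, bonds, J=1.0):
    """Sketch of E_loc(s) = sum_{s'} <s|H|s'> psi(s')/psi(s) for the
    spin-1/2 Heisenberg model.

    log_psi: any NN mapping spin configurations to real log-amplitudes.
    sigma:   (B, N) float tensor of +/-1 spins.
    bonds:   list of nearest-neighbour pairs (i, j) on the 2D lattice.
    """
    B, N = sigma.shape
    log_psi_s = log_psi(sigma)                            # (B,)
    diag = torch.zeros(B, device=sigma.device)
    offdiag = torch.zeros(B, device=sigma.device)
    for (i, j) in bonds:                                  # O(N) bonds
        zz = sigma[:, i] * sigma[:, j]
        diag += 0.25 * J * zz                             # S^z S^z term
        flipped = sigma.clone()
        flipped[:, [i, j]] = sigma[:, [j, i]]             # exchange the two spins
        ratio = torch.exp(log_psi(flipped) - log_psi_s)   # psi(s')/psi(s)
        offdiag += 0.5 * J * (zz < 0).float() * ratio     # only anti-parallel pairs connect
    return diag + offdiag
```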
There is increasing adoption of artificial intelligence in drug discovery. However, existing works mainly apply machine learning to the chemical structures of molecules, ignoring the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. We present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct the largest multi-modal dataset to date, namely PubChemSTM, with over 280K chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM possesses two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts across various benchmarks.
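A minimal sketch of the kind of CLIP-style contrastive objective described above, assuming paired structure and text embeddings are already produced by two encoders (e.g. a graph encoder for the molecule and a text encoder for the description); the paper's exact encoders and loss details are not reproduced here.

```python
import torch
import torch.nn.functional as F

def structure_text_contrastive_loss(mol_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE between paired molecule-structure and text embeddings.

    mol_emb, txt_emb: (B, d) embeddings where row i of each is a matched pair.
    """
    mol = F.normalize(mol_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = mol @ txt.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(mol.size(0), device=mol.device)
    # Match each structure to its paired text and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```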
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
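As an illustration of the factorized form such a framework searches over, the sketch below replaces a dense projection with a rank-r factorization; HEAT's actual tensorization-shape and rank search under a hardware cost model is not shown, and the layer sizes and rank are illustrative.

```python
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Replace an over-parameterized d_out x d_in weight with a rank-r
    factorization W ~= U V, cutting parameters and FLOPs from d_out*d_in
    to roughly r*(d_in + d_out)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.V = nn.Linear(d_in, rank, bias=False)   # d_in -> r
        self.U = nn.Linear(rank, d_out)              # r -> d_out

    def forward(self, x):
        return self.U(self.V(x))

# e.g. a BERT-base feedforward projection, 768 -> 3072:
dense = FactorizedLinear(768, 3072, rank=128)
# params: 128*(768 + 3072) + 3072 ~= 0.49M vs. 768*3072 + 3072 ~= 2.36M
```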
The physics-informed neural operator (PINO) is a machine learning architecture that has shown promising empirical results for learning partial differential equations. PINO uses the Fourier neural operator (FNO) architecture to overcome the optimization challenges often faced by physics-informed neural networks. Since the convolution operator in PINO uses the Fourier series representation, its gradient can be computed exactly on the Fourier space. While Fourier series cannot represent nonperiodic functions, PINO and FNO still have the expressivity to learn nonperiodic problems with Fourier extension via padding. However, computing the Fourier extension in the physics-informed optimization requires solving an ill-conditioned system, resulting in inaccurate derivatives which prevent effective optimization. In this work, we present an architecture that leverages Fourier continuation (FC) to apply the exact gradient method to PINO for nonperiodic problems. This paper investigates three different ways that FC can be incorporated into PINO by testing their performance on a 1D blowup problem. Experiments show that FC-PINO outperforms padded PINO, improving equation loss by several orders of magnitude, and it can accurately capture the third order derivatives of nonsmooth solution functions.
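A hedged toy of the underlying idea, not the FC(Gram) construction used in the FC literature: append a smooth "bridge" that returns the signal to its starting value and slope, so the extension is periodic and an FFT derivative becomes far more accurate on the original, nonperiodic interval.

```python
import numpy as np

def fc_spectral_derivative(f_vals, h, n_bridge=25):
    """Toy Fourier-continuation-style derivative of f sampled on a uniform
    grid a, a+h, ..., b (inclusive): append a cubic Hermite bridge that
    leaves (f(b), f'(b)) and lands on (f(a), f'(a)), then differentiate the
    periodic extension spectrally and keep only the original interval."""
    fp_a = (f_vals[1] - f_vals[0]) / h            # one-sided endpoint slopes
    fp_b = (f_vals[-1] - f_vals[-2]) / h
    L_br = n_bridge * h                           # bridge length in x
    t = np.arange(1, n_bridge) / n_bridge         # interior bridge points
    h00, h10 = 2*t**3 - 3*t**2 + 1, t**3 - 2*t**2 + t
    h01, h11 = -2*t**3 + 3*t**2, t**3 - t**2
    bridge = (h00*f_vals[-1] + h10*L_br*fp_b
              + h01*f_vals[0] + h11*L_br*fp_a)
    g = np.concatenate([f_vals, bridge])          # smooth periodic extension
    k = 2j*np.pi*np.fft.fftfreq(g.size, d=h)      # spectral wavenumbers
    dg = np.fft.ifft(k*np.fft.fft(g)).real
    return dg[:f_vals.size]

# Example: d/dx exp(x) on [0, 1], a nonperiodic function.
x = np.linspace(0.0, 1.0, 201)
df = fc_spectral_derivative(np.exp(x), h=x[1] - x[0])
```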
Recently, neural networks have proven their impressive ability to solve partial differential equations (PDEs). Among them, the Fourier neural operator (FNO) has shown success in learning solution operators for highly non-linear problems such as turbulent flow. FNO is discretization-invariant: it can be trained on low-resolution data and generalize to high-resolution problems. This property is related to the low-pass filters in FNO, where only a limited number of frequency modes are selected to propagate information. However, it remains a challenge to select an appropriate number of frequency modes and training resolution for different PDEs. Too few frequency modes and low-resolution data hurt generalization, while too many frequency modes and high-resolution data are computationally expensive and lead to over-fitting. To this end, we propose the Incremental Fourier Neural Operator (IFNO), which increases both the number of frequency modes and the data resolution incrementally during training. We show that IFNO achieves better generalization (around 15% reduction in testing L2 loss) while reducing the computational cost by 35%, compared to the standard FNO. In addition, we observe that IFNO follows the behavior of implicit regularization in FNO, which explains its excellent generalization ability.
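A hedged sketch of an FNO-style 1D spectral convolution whose set of active Fourier modes can be grown during training; the growth schedule, layer sizes, and the exact criterion used in the paper are assumptions.

```python
import torch
import torch.nn as nn

class IncrementalSpectralConv1d(nn.Module):
    """Spectral convolution that mixes channels only on its lowest
    `active_modes` frequency modes; `grow()` unlocks more modes over time.
    Assumes active_modes <= n_grid // 2 + 1."""
    def __init__(self, channels, max_modes, start_modes=4):
        super().__init__()
        self.max_modes = max_modes
        self.active_modes = start_modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, max_modes, dtype=torch.cfloat))

    def grow(self, step=2):
        self.active_modes = min(self.active_modes + step, self.max_modes)

    def forward(self, x):                         # x: (batch, channels, n_grid)
        m = self.active_modes
        x_ft = torch.fft.rfft(x)                  # (B, C, n_grid//2 + 1), complex
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :m] = torch.einsum(
            "bcm,com->bom", x_ft[:, :, :m], self.weight[:, :, :m])
        return torch.fft.irfft(out_ft, n=x.size(-1))
```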
State estimation is important for a variety of tasks, from forecasting to substituting for unmeasured states in feedback controllers. Performing real-time state estimation for PDEs using provably and rapidly converging observers, such as those based on PDE backstepping, is computationally expensive and in many cases prohibitive. We propose a framework for accelerating PDE observer computations using learning-based approaches that are much faster while maintaining accuracy. In particular, we employ the recently developed Fourier Neural Operator (FNO) to learn the functional mapping from the initial observer state and boundary measurements to the state estimate. By employing backstepping observer gains for previously designed observers with particular convergence-rate guarantees, we provide numerical experiments that evaluate the increased computational efficiency gained with the FNO. We consider state estimation for three benchmark PDE examples motivated by applications: first, a reaction-diffusion (parabolic) PDE whose state is estimated with an exponential rate of convergence; second, a parabolic PDE with exact prescribed-time estimation; and third, a pair of coupled first-order hyperbolic PDEs that model traffic flow density and velocity. The ML-accelerated observers trained on simulation datasets for these PDEs achieve up to three orders of magnitude improvement in computational speed compared to classical methods. This demonstrates the attractiveness of ML-accelerated observers for real-time state estimation and control.
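A minimal, heavily hedged sketch of the supervised setup: a learned operator (a placeholder for the FNO) is fit to the outputs of a classical backstepping observer on simulated trajectories; the tensor shapes and the input encoding are assumptions.

```python
import torch
import torch.nn as nn

def train_learned_observer(neural_operator, loader, epochs=10, lr=1e-3):
    """Fit a learned observer to data generated by a classical observer.

    Each batch holds the initial observer state u0_hat and the boundary
    measurement trace y_bdry (both assumed resampled onto a common grid of
    size n), plus the classical observer's estimate u_hat_target as target."""
    opt = torch.optim.Adam(neural_operator.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for u0_hat, y_bdry, u_hat_target in loader:       # each (B, n)
            inp = torch.stack([u0_hat, y_bdry], dim=1)    # (B, 2, n) input channels
            loss = loss_fn(neural_operator(inp), u_hat_target)
            opt.zero_grad(); loss.backward(); opt.step()
    return neural_operator
```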
This report describes the winning solution to the Robust Vision Challenge (RVC) semantic segmentation track at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses SegFormer as the segmentation framework. The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset-balancing strategy. All the original labels are projected to a 256-class unified label space, and the model is trained with a cross-entropy loss. Without significant hyperparameter tuning or any specific loss weighting, our solution ranks first on all the test semantic segmentation benchmarks from multiple domains (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, and WildDash 2). The proposed method can serve as a strong baseline for the multi-domain segmentation task and benefit future work. Code will be available at https://github.com/lambert-x/RVC_Segmentation.
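A minimal sketch of training against a unified label space: each source dataset's local class ids are remapped into the shared 256-class space with a lookup table before the cross-entropy loss. The mapping shown is illustrative, not the challenge's actual one.

```python
import torch
import torch.nn.functional as F

UNIFIED_NUM_CLASSES = 256

def build_label_lut(local_to_unified, ignore_index=-100):
    """Lookup table sending a dataset's local class ids (assumed in 0..255)
    to the unified space; unmapped ids fall back to `ignore_index`."""
    lut = torch.full((256,), ignore_index, dtype=torch.long)
    for local_id, unified_id in local_to_unified.items():
        lut[local_id] = unified_id
    return lut

def segmentation_loss(logits, local_labels, lut, ignore_index=-100):
    # logits: (B, 256, H, W); local_labels: (B, H, W) long tensor of local ids.
    unified = lut[local_labels]                   # remap into the shared space
    return F.cross_entropy(logits, unified, ignore_index=ignore_index)

# Toy example: three Cityscapes train ids mapped into the unified space.
cityscapes_lut = build_label_lut({0: 12, 1: 40, 2: 7})
```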
Trajectory prediction is essential for autonomous vehicles (AVs) to plan correct and safe driving behaviors. Although many prior works aim for higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. We devise an optimization-based adversarial attack framework that uses a carefully designed differentiable dynamic model to generate realistic adversarial trajectories. Empirically, we benchmark the adversarial robustness of state-of-the-art prediction models and show that our attack increases prediction error by more than 50% on general metrics and 37% on planning-aware metrics. We also show that our attack can cause an AV to drive off the road or collide with other vehicles in simulation. Finally, we demonstrate how to mitigate the adversarial attacks using an adversarial training scheme.
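A hedged sketch of such an optimization-based attack: perturb the attacked agent's control inputs, re-integrate a simple differentiable unicycle model so the adversarial history stays dynamically feasible, and ascend on the predictor's displacement error. The dynamics model, bounds, and loss here are assumptions, not the paper's exact choices.

```python
import torch

def attack_history(predictor, history, future_gt, dt=0.1, steps=30, lr=0.01,
                   max_accel=3.0, max_yaw_rate=0.5):
    """history: (T, 4) rows of [x, y, heading, speed] for the attacked agent.
    Returns a dynamically feasible adversarial (T, 2) position history."""
    T = history.shape[0]
    accel = torch.zeros(T - 1, requires_grad=True)
    yaw_rate = torch.zeros(T - 1, requires_grad=True)
    opt = torch.optim.Adam([accel, yaw_rate], lr=lr)
    for _ in range(steps):
        a = max_accel * torch.tanh(accel)         # keep controls physically plausible
        w = max_yaw_rate * torch.tanh(yaw_rate)
        x, y, th, v = history[0]                  # roll the unicycle model forward
        traj = [history[0, :2]]
        for t in range(T - 1):
            v = v + a[t] * dt
            th = th + w[t] * dt
            x = x + v * torch.cos(th) * dt
            y = y + v * torch.sin(th) * dt
            traj.append(torch.stack([x, y]))
        adv_history = torch.stack(traj)           # (T, 2), differentiable in (a, w)
        pred = predictor(adv_history)             # predicted future positions (T_f, 2)
        ade = (pred - future_gt).norm(dim=-1).mean()
        loss = -ade                               # ascend on prediction error
        opt.zero_grad(); loss.backward(); opt.step()
    return adv_history.detach()
```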
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization on many downstream tasks with properly designed text prompts. Rather than relying on hand-engineered prompts, recent works learn prompts from the training data of downstream tasks. While effective, training on domain-specific data reduces a model's generalization capability to unseen new domains. In this work, we propose test-time prompt tuning (TPT), a method that can learn adaptive prompts on the fly from a single test sample. For image classification, TPT optimizes the prompt by minimizing entropy with confidence selection, so that the model has consistent predictions across different augmented views of each test sample. In evaluating generalization to natural distribution shifts, TPT improves the zero-shot top-1 accuracy by 3.6% on average, surpassing previous prompt tuning approaches that require additional task-specific training data. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with state-of-the-art approaches that use additional training data. Project page: https://azshue.github.io/tpt.
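A minimal sketch of the per-sample adaptation loop, assuming a CLIP-like `model(images, prompt)` that returns class probabilities and an arbitrary augmentation pipeline; the hyperparameters are illustrative rather than the paper's.

```python
import torch

def test_time_prompt_tuning(model, prompt, image, augment, n_views=64,
                            keep_ratio=0.1, lr=5e-3, steps=1):
    """Tune only the learnable prompt embeddings for a single test image by
    minimizing the entropy of the averaged prediction over the most
    confident augmented views."""
    prompt = prompt.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt], lr=lr)
    for _ in range(steps):
        views = torch.stack([augment(image) for _ in range(n_views)])
        probs = model(views, prompt)                          # (n_views, n_classes)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)
        k = max(1, int(keep_ratio * n_views))
        keep = entropy.topk(k, largest=False).indices         # confident views only
        avg_prob = probs[keep].mean(dim=0)
        loss = -(avg_prob * avg_prob.clamp_min(1e-8).log()).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return prompt.detach()
```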
Generating new molecules with specific chemical and biological properties via generative models has become a promising direction for drug discovery. However, existing methods require large datasets for extensive training/fine-tuning, which are often unavailable in real-world settings. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., molecules that (partially) satisfy the design criteria, to steer a pre-trained generative model toward synthesizing molecules that meet the given design criteria. We design a retrieval mechanism that fuses the exemplar molecules with the input molecule, trained with a new self-supervised objective that predicts the input molecule's nearest neighbor. We also propose an iterative refinement process to dynamically update the generated molecules and the retrieval database for better generalization. Our approach is agnostic to the choice of generative model and requires no task-specific fine-tuning. On a variety of tasks, ranging from simple design criteria to the challenging real-world scenario of designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate that our approach extrapolates well beyond the retrieval database and achieves better performance and wider applicability than previous methods.
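A minimal sketch of the retrieval step, assuming molecules are compared in a shared embedding space and fused by some conditioning module (e.g. cross-attention); none of these components are the paper's exact ones, and the self-supervised nearest-neighbor objective is only noted in a comment.

```python
import torch
import torch.nn.functional as F

def retrieve_and_fuse(input_emb, db_embs, fusion, k=10):
    """Retrieve the k exemplar molecules closest to the input in embedding
    space and fuse them with the input representation before decoding.

    input_emb: (1, d) embedding of the input molecule.
    db_embs:   (|DB|, d) embeddings of the exemplar database.
    fusion:    placeholder module combining input and exemplar embeddings;
               in training it could be supervised to predict the input's
               nearest neighbor (the self-supervised objective).
    """
    sims = F.normalize(input_emb, dim=-1) @ F.normalize(db_embs, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices.squeeze(0)   # (k,) indices of exemplars
    exemplars = db_embs[topk]                        # (k, d) retrieved embeddings
    fused = fusion(input_emb, exemplars)             # conditions the generator
    return fused, topk
```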